perm filename A74.TEX[254,RWF] blob sn#864883 filedate 1988-11-23 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\input macro[1,mps]
C00019 ENDMK
C⊗;
\input macro[1,mps]
\magnification\magstephalf
\baselineskip 14pt
\leftline{\sevenrm A74.Tex[254,mps]\today}
\leftline{\copyright\sevenrm Robert W. Floyd, November 22, 1988}
\leftline{\sevenrm Unpublished draft}
\bigskip
\noindent{\bf The Partition Theorem}

In a CFL with grammar $G$, if $x↓1x↓2\ldots x↓k=x\Rightarrow y$ by
production $A→w$, $A$~must occur in some~$x↓j=uAv$. Define $y↓j=uwv$,
$y↓i=x↓i$ if $i≠j$; we find $x=x↓1x↓2\ldots x↓k\Rightarrow y↓1y↓2
\ldots y↓k=y$. By induction, if $x↓1x↓2\ldots x↓k=x\aRa y$, then $y$~may
be factored into $y↓1y↓2\ldots y↓k$ with $x↓i\aRa y↓i$.

\proclaim
Theorem. If the first string in a derivation is a concatenation of several
substrings, subsequent strings and the entire derivation may be factored
in a corresponding way.

For example, any $x↓i$ may be changed to~$y↓i$ before any other rewriting.

\medskip
\line{\bf Standardizing Context-Free Grammars\hfill}

\smallskip
\disleft 20pt:(1):
Change each production of the form $A→X↓1X↓2\ldots X↓k$ to the set
$A→A↓kX↓k$, 
$A↓k→A↓{k-1}X↓{k-1}$, 
$\ldots\,$,
$A↓2→A↓1X↓1$, 
$A↓1→\Lambda$, 
where each~$A↓i$ is a new variable. The set of terminal strings derivable
from~$A$ remains the same, so the language defined remains the same, but
lengths of right hand sides of productions are $≤2$, and the transformations
below preserve this property.

\smallskip
\disleft 20pt:(2):
Determine which variables derive the empty string. For each production
$A→\alpha B\gamma$ where $B\aRa\Lambda$, include the production $A→\alpha\gamma$,
until no further productions are generated; this does not alter the set of
strings derived from each variable. Now omit all productions with~$\Lambda$
as right side. By induction on length of derivation, if
$A\aRa x\in T↑+$ before this transformation, $A\aRa x$ afterward, but no
longer can $A\aRa\Lambda$. The resultant language is the original with
the possible omission of~$\Lambda$, but no right side is empty, so lengths
in derivations are non-decreasing. Transformation~(2) preserves the 
postcondition of transformation~(1), as it lengthens no productions.

\smallskip
\disleft 20pt::
To determine which variables derive the empty string, initialize $N$ 
to~$\emptyset$, then for each production $A→X\in N↑{\ast}$, include~$A$
in~$N$, until no further change occurs. Induction on length of derivations
shows $A\aRa\Lambda$ iff $A\in N$.

\smallskip
\smallskip
\disleft 20pt:(3):
The following transformation requires as a precondition the postcondition
of transformation~(2), that all right sides of productions be nonempty.
Define $P'$ as $P↑+\circ I↓{\Sigma↑{\ast}-V}$, $G'$~as~$G$ with~$P'$
replacing~$P$. Right sides of~$P'$ are not variables, nor empty. Since
$P'\subseteq\,\aRag$, $L↓{G'}\subseteq L↓G$. By induction on length of
derivation, if $X\aRag u\in T↑+$, $X\in\Sigma$, then $X\aRagp u$. For
the induction step, $X\in V$. Let $y=Y↓1Y↓2\ldots Y↓k$ be the first
line in the derivation not in~$V$; $X\togp y=Y↓1Y↓2\ldots Y↓k\aRag 
u↓1u↓2\ldots u↓k=u$, where by induction $Y↓i\aRag u↓i$ implies
$Y↓i\aRagp u↓i$, so $X\aRagp u$. The language of~$G'$ is the same as that
of~$G$, but right sides of productions are neither variables nor empty.
The postconditions of~(1) and~(2) are preserved.

\smallskip
\disleft 20pt:(4):
For each $a\in T$, introduce a variable $B↓a$ and a production $B↓a→a$;
replace~$a$ with~$B↓a$ in right sides of length $≥2$. The language is
unchanged, but right sides of length $≥2$ belong to~$V↑{\ast}$. The
postconditions of~(1), (2), and~(3) are preserved, so after (1), (2), (3),
and~(4) all right sides belong to~$T$ or~$V↑2$. This form is called
Chomsky Normal Form (CNF).

%\smallskip
\disleft 20pt:(5):
Determine which variables derive some terminal string. Eliminate all variables
that derive no terminal string, and all productions that contain them. This
does not alter the set of terminal derivatives of variables, so the language
is unchanged. If the root symbol is eliminated, the language is empty. Because
no new productions are introduced, the postconditions of (1), (2), (3),
and~(4) are preserved.

\smallskip
\disleft 20pt::
To determine which variables derive a terminal string, initialize $\tau$
to~$\emptyset$. If $A→x\in(\tau\cup T)↑{\ast}$,
insert~$A$ in~$\tau$. When no further insertions are possible, $\tau$~is
the set of variables that derive some terminal string.

\smallskip
\disleft 20pt:(6):
Determine which variables occur in no sentential form (string derived
from~$S$). Omit all such variables, and all productions containing them.
This transformation preserves the postcondition of~(5), because if a
variable~$A$ survives~(6), so does any variable occuring in a derivation
from~$A$ of a terminal string, and so does the entire derivation. After
(5) and~(6), every variable is used in some derivation of a sentence.
Transformation (6) also preserves the postconditions of (1), (2), (3),
and~(4), because it introduces no new productions.

\smallskip
\disleft 20pt::
To determine which variables occur in a sentential form, define relation~$R$
by $XRY$ if $X→\alpha Y\beta$; $S\circ R↑{\ast}$ is the set of variables occurring
in sentential forms.

\smallskip
\disleft 20pt:(7):
If the empty string was eliminated from the language by (2), reintroduce
it with the production $S→\Lambda$.

\smallskip
We conclude:
From any CFG can be constructed another for the same language, in which
all productions are of the forms $V→T$, $V→V↑2$ or $S→\Lambda$, and in
which every variable (except possibly~$S$) may be used in deriving some
sentence. Symbolically,
$$\eqalign{P\subseteq\bigl(V\times(T\cup V\otimes V)\bigr)\cup\{\langle
S,\Lambda\rangle\}\,;\cr
\hbox{If }A\in V,\; S\aRa uAw,\; A\aRa v\,.\cr}$$

\vfill\eject
\line{\bf Greibach's Theorem\hfill}

Let $G$ be a CFG where no right hand side 
is in $\{\Lambda\}\cup V$. We already know
that all CFGs can be put in this form, without altering their languages
except by omitting~$\Lambda$.
Assume $V↓G=\{A↓i\}$. Define a grammar~$G'$, with variables
$V'=\{A↓i\}\cup\{B↓{ij}\}$. Define the productions~$P'$ of~$G'$ as follows:

\smallskip
\disleft 20pt:(1):
For each $A↓i→A↓j\beta$ in $P$, $B↓{kj}→\beta B↓{ki}$ is in~$P'$ for all~$k$.

\smallskip
\disleft 20pt:(2):
For each $A↓i→t\gamma$ in $P$, $A↓k→t\gamma B↓{ki}$ is in $P'$ for all~$k$.

\smallskip
\disleft 20pt:(3):
For all $i$, $B↓{ii}→\Lambda$ is in $P'$.

\smallskip
\disleft 20pt:(4):
That's all.

\noindent
(Subscripts, above, are taken from some finite set.)

\proclaim Theorem. $A↓i\aRag u$ iff $A↓i\aRagp u$.

\vfill\eject

\noindent
{\bf Part 1:} Assume $A↓i\aRag u$. Because $A↓i$ is a variable, and $u$
contains no variables, using the partition theorem we can assume that the
derivation begins by applying productions to the first symbol (at least once)
until it is terminal. Then the first part of the derivation is, say,
$$\vcenter{\halign{\hfil$#$\qquad&$#$\hfil\cr
\hbox{String}&\hbox{Production}\cr
\noalign{\smallskip}
A↓0\cr
&A↓0→A↓1\beta↓1\cr
A↓1\beta↓1\cr
&A↓1→A↓2\beta↓2\cr
A↓2\beta↓2\beta↓1\cr
&A↓2→A↓3\beta↓3\cr
A↓3\beta↓3\beta↓2\beta↓1\cr
&A↓3→A↓4\beta↓4\cr
A↓4\beta↓4\beta↓3\beta↓2\beta↓1\cr
&A↓4→t\gamma\cr
t\gamma\beta↓4\beta↓3\beta↓2\beta↓1\cr
\noalign{\medskip}
\noalign{\hbox{But the same derivation in $G'$ is}\hfil}
A↓0\cr
&A↓0→t\gamma B↓{04}\cr
t\gamma B↓{04}\cr
&B↓{04}→\beta↓4B↓{03}\cr
t\gamma\beta↓4 B↓{03}\cr
&B↓{03}→\beta↓3B↓{02}\cr
t\gamma\beta↓4\beta↓3 B↓{02}\cr
&B↓{02}→\beta↓2B↓{01}\cr
t\gamma\beta↓4\beta↓3\beta↓2 B↓{01}\cr
&B↓{01}→\beta↓1B↓{00}\cr
t\gamma\beta↓4\beta↓3\beta↓2\beta↓1 B↓{00}\cr
&B↓{00}→\Lambda\cr
t\gamma\beta↓4\beta↓3\beta↓2\beta↓1\cr}}$$
and, by the partition theorem and course-of-values induction, if
$t\gamma\beta↓4\beta↓3\beta↓2\beta↓1\aRag u$,
$t\gamma\beta↓4\beta↓3\beta↓2\beta↓1\aRagp u$.

\vfill\eject

\noindent
{\bf Part 2:} Assume $A↓i\aRagp u$. Then the derivation must begin with a
production from part~(2), and by rewriting the rightmost symbol until it
is no longer~a~$B$, we get as a typical derivation:
$$\vcenter{\halign{\hfil$#$\qquad&$#$\hfil\qquad&$#$\hfil\cr
\hbox{String}\quad&\hbox{Production in $G'$}&\hbox{From production in $G$}\cr
\noalign{\smallskip}
A↓0\cr
&A↓0→t\gamma B↓{04}&A↓4→t\gamma\cr
t\gamma B↓{04}\cr
&B↓{04}→\beta↓4B↓{03}&A↓3→A↓4\beta↓4\cr
t\gamma\beta↓4 B↓{03}\cr
&B↓{03}→\beta↓3B↓{02}&A↓2→A↓3\beta↓3\cr
t\gamma\beta↓4\beta↓3 B↓{02}\cr
&B↓{02}→\beta↓2B↓{01}&A↓1→A↓2\beta↓2\cr
t\gamma\beta↓4\beta↓3\beta↓2 B↓{01}\cr
&B↓{01}→\beta↓1B↓{00}&A↓0→A↓1\beta↓1\cr
t\gamma\beta↓4\beta↓3\beta↓2\beta↓1 B↓{00}\cr
&B↓{00}→\Lambda\cr
t\gamma\beta↓4\beta↓3\beta↓2\beta↓1\cr}}$$
But the productions of $G'$ we have used imply the existence of the
productions of~$G$, above, that derive the same string. Again by the
partition theorem and course-of-values induction, $A↓i\aRag u$. In particular,
$S\aRag u$ iff $S\aRagp u$, so the languages are the same.

In the above construction, all the strings called~$\beta$ are in~$V↑+$, so all
productions of~$G'$ are of the forms:
$$\eqalign{B↓{ij}&→A↓k\,\alpha↓1|t\alpha↓1\cr
A↓k&→t\,\alpha↓2\cr
B↓{ii}&→\Lambda\cr}$$
and eliminating $\Lambda$ as a right side leaves the forms
$$\eqalign{B↓{ij}&→A↓k\,\alpha↓1|t\alpha↓1\cr
A↓k&→t\,\alpha↓2\,.\cr}$$
Eliminating $A$'s as the first character on the right, we get forms
$$\eqalign{B↓{ij}&→t\,\alpha↓2\,\alpha↓1\cr
A↓k&→t\,\alpha↓2\cr}$$
where each right side begins with a terminal symbol. This is the major
result of Greibach's theorem, the rest of which is a drill:

\proclaim Theorem. For every CFL, there is a CFG in which 
$P\subseteq V\times (T\otimes V↑{\ast})$. 

We have already shown
$P\subseteq V\times (T\otimes\Sigma↑{\ast})$.
\bye